% Copyright 2018 Google LLC
%
% Use of this source code is governed by an MIT-style
% license that can be found in the LICENSE file or at
% https://opensource.org/licenses/MIT.

%!TeX spellcheck = en-US

\documentclass[eprint.tex]{subfiles}
\begin{document}
\section{Design}
\parintro{Three-pass structure}
Any secure PRP must have a pass that reads all of the plaintext, followed by a pass that modifies
it all. A secure SPRP must have the same property in the reverse direction;
a three-pass structure therefore seems natural.
$\epsilon$-$\Delta$U functions are the fastest options for reading the plaintext in a
cryptographically useful way, and stream ciphers are the fastest options for modifying it.
$\epsilon$-$\Delta$Us
are typically much faster than stream ciphers, and so the hash-XOR-hash structure emerges as
the best option for performance. This structure also has the advantage that it naturally handles
messages in non-round sizes; many VIL modes need extra wrinkles akin to ciphertext stealing
to handle the case where the message is not
a multiple of the block size of the underlying block cipher.

\parintro{Block cipher}
\cite{luby-rackoff} observes that a three-round Feistel network cannot by itself be a secure SPRP;
a simple attack with two plaintexts and one ciphertext distinguishes it. A single block cipher call
in the narrow part of the unbalanced network suffices to frustrate this attack; the
larger the message, the smaller the relative cost of this call. If the plaintext is exactly $n$ bits
long, the stream cipher is not used, and the construction becomes
$C = E_{K_E}(P \boxplus H_{K_H}(T, \lambda)) \boxminus H_{K_H}(T, \lambda)$
as per Subsection 3.1 of \cite{tweakable}.
Compared to HCTR~\cite{hctr} or HCH~\cite{hch}, we sacrifice
symmetry of encryption with decryption in return for
the ability to run the block cipher and stream cipher in parallel when decrypting.
For disk encryption, decryption performance matters most:
reads are more frequent than writes, and reads generally affect user-perceived latency, while
operating systems can usually perform writes asynchronously in the background.

\parintro{Components}
It's unusual for a construction to require more than two distinct primitive components.
More commonly, a hash-XOR-hash mode uses the block cipher to build a stream cipher
(eg using CTR mode~\cite{ctr}) and also uses it directly on the narrow side of the message.
Using XChaCha12 in place of a block cipher affords a significant increase in performance;
however it cannot easily be substituted in the narrow side of the cipher.
\cite{sarkar1,sarkar2,sarkar3,sarkar4} use only an $\epsilon$-$\Delta$U function
and a stream cipher, and build a hash-XOR-hash SPRP
with a construction that uses a four-round Feistel network over the non-bulk side of the data
broken into two halves. However if we were to build this using XChaCha12,
such a construction would require four extra invocations of ChaCha per message, which would be
a much greater cost than one block cipher invocation.

\parintro{KDM security}
We do not consider an attack model in which derived keys are presented as input.
Length-preserving encryption
which is KDM-secure in the sense of~\cite{kdm} is impossible, since it is trivial for the
adversary to submit a query with a $g$-function
that constructs a plaintext whose ciphertext is all zeroes.
Whether there is a notion of KDM-security that can be
applied in this domain is an open problem. Users must take care to protect the keys from being
included in the input.

\parintro{Stream cipher}
Users are highly sensitive to the performance of disk encryption; an
extra microsecond decrypting the contents of a sector can mean many users
forgoing encryption altogether.
eBACS~\cite{supercop} tests a wide variety of stream ciphers on a wide variety
of architectures; ChaCha12 is consistently one of the
fastest options for the ``armeabi'' (32-bit ARM) architecture.
ChaCha and its predecessor Salsa20
have seen intense cryptanalysis in the decade or so since publication
\cite{tdcs20,nonrandomsalsa,tsunoo,latindance,ishiguro2011,ishiguro2012,zhenqing2012,
maitra2015,chachamaitra,choudhuri2016,dey2017,Choudhuri_Maitra_2017,chacha2018};
the best attack breaks 7 rounds, a landmark reached with \cite{latindance} in 2008.
Each round greatly increases the difficulty of attack.
We therefore feel confident selecting the 12-round variant as giving
good confidence in security while minimizing the cost to users.

\parintro{Hash function}
Since the $\epsilon$-$\Delta$U is run twice over the bulk of the message, its speed is especially
crucial for large messages. One of the fastest such functions in software is NH, and
it's also appealingly simple; however as discussed in~\autoref{nh} it generally has to be
combined with a second hashing stage, and for this purpose we use Poly1305. The 1KiB input size
we specify for NH means that a simple, portable implementation of Poly1305 can be
used without a great cost in speed; in contrast, for HPolyC a vectorized
Poly1305 implementation is important.
We considered using UHASH (as defined for UMAC~\cite{rfc4418}) rather than our
custom combination of NH and Poly1305; however, available UHASH implementations
are not constant-time, and a constant-time implementation would be significantly
slower.

\parintro{Key agility}
For the 4KiB messages of disk encryption,
the ~1KiB NH key size has only a small impact on key agility. Applications
that need high key agility even on small messages may instead use HPolyC, which
uses Poly1305 directly. The main
cost of a new HPolyC key is a single XChaCha12 invocation to generate subkeys.
ChaCha12 has no key schedule
and makes no use of precomputation; XChaCha12
requires one extra call to the ChaCha permutation for each new nonce.
No extra work is required for differing message or tweak lengths for either Adiantum
or HPolyC.

\parintro{Constant-time}
NH, Poly1305 and ChaCha12 are designed such that the most natural fast implementations are
constant-time and free from data-dependent lookups. So long as the block cipher implementation
also has these properties, Adiantum and HPolyC will inherit security against
this class of side-channel attacks.

\parintro{Parallelizability}
NH, Poly1305 and ChaCha12 are highly parallelizable.
The stream cipher and second hash stages can also be run in combination for a total
of two passes over the bulk of the data, unlike a mode such as HEH~\cite{heh}
which requires at least three.
We put the ``special'' part on the right so that in typical uses the bulk encryption has
the best alignment for fast operations.

\parintro{Naming}
``Adiantum'' is the genus of the maidenhair fern, which in the language of
flowers (floriography) signifies sincerity and discretion.~\cite{fleurs}

\subbib
\end{document}
